Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

leap: k8s 1.25, node sharing setup, no more pre-pulling, no maintenance notice #2337

Merged

Conversation

consideRatio
Copy link
Member

@consideRatio consideRatio commented Mar 12, 2023

  • Addresses all but the "dask-gateway options" section of Transition LEAP cloud infra to use shared nodes #2209
  • Since I was about to recreate all node pools to get highmem machines, I wanted to upgrade k8s to 1.25 while I was at it. But, this required support from our terraform scripts that I added in 64da264.
  • In a93926a I disable prePuller with a motivation in the commit message

Already deployed

This is already deployed and the terraform changes are applied together with the profileList changes that are tightly coupled with them. Due to that, it is important that this is merged before another PR makes a redeploy of the leap hub by accident with an old profileList.

This was an optimization for a workshop where I assume nodes were
pre-started and made sense for that, but would slow down startup on new
nodes if not this specific image was requested on a non-pre-warmed node
after the workshop.

Since the workshop has ended, we can safely remove this now.

Related tickets:
- Workshop: https://2i2c.freshdesk.com/a/tickets/349
- Slow startup discussed: https://2i2c.freshdesk.com/a/tickets/435
@github-actions
Copy link

github-actions bot commented Mar 12, 2023

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

Cloud Provider Cluster Name Upgrade Support? Reason for Support Redeploy Upgrade Staging? Reason for Staging Redeploy
gcp leap No Yes Following helm chart values files were modified: common.values.yaml

Production deployments

Cloud Provider Cluster Name Hub Name Reason for Redeploy
gcp leap prod Following helm chart values files were modified: common.values.yaml

@consideRatio consideRatio requested a review from a team March 12, 2023 15:57
Copy link
Contributor

@pnasrat pnasrat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an explanation in here or #2209 about removing the pre-puller can you just explain why that's part of this change (or link me to the appropriate bug). Is there any impact to community startup time with that removed?

Everything else looks fine and I'll approve once you've responded

@consideRatio
Copy link
Member Author

Ah I forgot to surface this in the PR description, there is this git commit message about it:

leap: stop pulling the tensorflow image (slows down other startups)

This was an optimization for a workshop where I assume nodes were
pre-started and made sense for that, but would slow down startup on new
nodes if not this specific image was requested on a non-pre-warmed node
after the workshop.

Since the workshop has ended, we can safely remove this now.

Related tickets:

Copy link
Contributor

@pnasrat pnasrat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - yeah if you go from the PR straight to Files Changed for review you don't necessarily see commit messages.

LGTM

@consideRatio
Copy link
Member Author

Thank you for the quick reviews @pnasrat!!

I opened #2344 about this, we lack a policy about this configuration overall still.

@consideRatio consideRatio merged commit de1b067 into 2i2c-org:master Mar 13, 2023
@github-actions
Copy link

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/4405036988

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants